Final Presentation

Ian Mc Farlane, Bram Stults

2025-04-16

Introduction

Stakeholder

Mission

Context

This analysis builds on the work of Dr. Paul Maglione, ‘Convergence of cytokine dysregulation and antibody deficiency in common variable immunodeficiency with inflammatory complications’.

Local and Global Impact

We hope our analysis will serve as a support system and offer a useful viewpoint:

Locally

  • Building a code repository that is useful both as a tool and as evidence supporting publishable results
  • Supporting the research of Drs. W. Evan Johnson (Rutgers) and Paul J. Maglione (Boston University)

Globally

  • Expanding understanding of a less-understood immunological topic and contributing to the body of evidence

  • Supporting the immunological research community and, we hope, providing a data point that may one day improve care

Continuing Professional Development

Data Science Lifecycle

1. Data acquisition and representativeness

Dr. Johnson provided three separate data sets directly. These were developed or measured by research teams at Boston University and Rutgers:

  1. Nasopharynx sampling of controls and CVID-diagnosed patients
  2. Peripheral blood mononuclear cell (PBMC) samples (white blood cells found in the bloodstream) from controls and CVID-diagnosed patients
  3. Nasal and blood samples from patients living with tuberculosis and from controls

As with other genomic studies, this analysis was subject to the “small N, big P” problem: far fewer samples (N) than measured features such as genes (P).
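To make the “small N, big P” problem concrete, here is a minimal Python sketch on made-up numbers (6 samples and 10 features, chosen only for illustration, not the project's data). It computes the rank of a samples-by-features matrix with exact Gaussian elimination. The rank can never exceed the number of samples, which is one reason a single model with a coefficient for every feature cannot be fit uniquely when P is larger than N.

```python
from fractions import Fraction
import random


def matrix_rank(rows):
    """Rank of a matrix (list of rows) via exact Gaussian elimination."""
    m = [[Fraction(v) for v in row] for row in rows]
    rank = 0
    n_rows = len(m)
    n_cols = len(m[0]) if m else 0
    for col in range(n_cols):
        # Find a row at or below `rank` with a non-zero entry in this column.
        pivot = next((r for r in range(rank, n_rows) if m[r][col] != 0), None)
        if pivot is None:
            continue
        m[rank], m[pivot] = m[pivot], m[rank]
        # Eliminate this column from every row below the pivot row.
        for r in range(rank + 1, n_rows):
            factor = m[r][col] / m[rank][col]
            m[r] = [a - factor * b for a, b in zip(m[r], m[rank])]
        rank += 1
        if rank == n_rows:
            break
    return rank


# Hypothetical toy dimensions: N = 6 samples, P = 10 features (e.g. genes).
random.seed(0)
n_samples, n_features = 6, 10
counts = [[random.randint(0, 50) for _ in range(n_features)]
          for _ in range(n_samples)]

rank = matrix_rank(counts)
# The rank is bounded by min(N, P), so with N < P at least P - N
# feature directions cannot be told apart from the data alone.
unidentifiable = n_features - rank
```

In real expression data P is in the tens of thousands while N is often in the tens, so the gap is far larger than in this toy case.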

2. Data management

3. Data preparation and integration



4. Data analysis

\[ K_{ij} \thicksim NB(\mu_{ij}, \alpha_i) \]
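As an illustration of the notation above, the following Python sketch assumes the usual mean-dispersion reading of the negative binomial: the count K_ij for gene i in sample j has mean μ_ij and variance μ_ij + α_i μ_ij². The values of `mu_ij` and `alpha_i` are hypothetical, and the helper functions are written for this example only; they are not part of DESeq2 or of the project code.

```python
import math


def nb_log_pmf(k, mu, alpha):
    """Log-probability of count k under NB with mean mu and dispersion alpha."""
    r = 1.0 / alpha  # "size" parameter; as alpha shrinks, NB approaches Poisson
    return (math.lgamma(k + r) - math.lgamma(r) - math.lgamma(k + 1)
            + r * math.log(r / (r + mu))
            + k * math.log(mu / (r + mu)))


def nb_moments(mu, alpha, k_max=5000):
    """Numerically sum the pmf to recover its total mass, mean and variance."""
    probs = [math.exp(nb_log_pmf(k, mu, alpha)) for k in range(k_max + 1)]
    total = sum(probs)
    mean = sum(k * p for k, p in enumerate(probs))
    var = sum((k - mean) ** 2 * p for k, p in enumerate(probs))
    return total, mean, var


# Hypothetical values for one gene i in one sample j (not from the project).
mu_ij, alpha_i = 100.0, 0.2
total, mean, var = nb_moments(mu_ij, alpha_i)
expected_var = mu_ij + alpha_i * mu_ij ** 2  # NB mean-variance relation
```

The extra α_i μ_ij² term is what lets the model absorb variability beyond what a Poisson model (variance equal to the mean) would allow.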

5. Model development/deployment

Model development wasn’t a central concern of this project. We did fit DESeq2 models, but only for statistical inference.

Theoretical model development might involve:

6. Communication of Knowledge Obtained from Data

Non-technical Discussion

Legal, professional, ethical, security, and social issues were discussed in depth:

  1. Samples are described with metadata for statistical analysis, but that metadata must be reviewed separately to protect patient anonymity.
  2. The social ethics of designations and sensitive language were discussed: samples taken in Uganda involved patients living with HIV. Care, personal ethics, and sensitivity are important when designating these subjects. A diagnosis of tuberculosis may warrant similar sensitivity.
  3. The evolving landscape of clinical research and public funding was discussed with Dr. Johnson, including professional practices and current risks.

Development Methodology

Methodology

  • Worked:
    • Version control and collaboration through GitHub
    • Regular meetings with our stakeholder (Dr. Johnson)
    • Mostly organized file structure
  • Didn’t Work:
    • Lack of a consistent coding style, which led to shifting conventions
    • Duplicate files across several folders

Teamwork

  • Worked:
    • Different skill-sets
    • Largely clear task division
    • Multi-channel communication
  • Didn’t Work:
    • Occasional time conflicts from multiple schedules
    • Skill-set differentials

Outside Classes

We relied on many courses central to and supportive of the Data Science degree track:

Math Courses

  • MATH 3700: General Statistical Knowledge
  • MATH 2170: Matrix Mastery and Introduction to Dimension Reduction
  • MATH 3150: Introduction to linear models and advanced statistics
  • MATH 3190: Advanced R and Rmd

Support Courses

  • ANLY 4100: Data Visualization
  • ANLY 4110: Advanced Data Visualization
  • CSCY 2400: Ethical Data Issues
  • CS 2420: Data Structures Basics
  • BIOL 3060: Genetics background

Conclusion

Stakeholder Communication:

  • Gained experience in presenting progress, gathering feedback, and making changes to our work

New Industry & Tools:

  • Developed familiarity with common bioinformatics workflows and the unique challenges of clinical genomic data

  • Gained hands-on experience with tools like DESeq2, enrichR, and SummarizedExperiment

Impact:

  • Contributed to ongoing research on complex diseases like CVID, and demonstrated the potential of data science to inform clinical discovery

Thank you